Text Line Extraction in Historical Documents Using Mask R-CNN
Authors
Abstract
Text line extraction is an essential preprocessing step in many handwritten document image analysis tasks. It includes detecting the text lines in a document image and segmenting the regions of each detected line. Deep learning-based methods are frequently used for text line detection. However, only a limited number of methods tackle the problems of detection and segmentation together. This paper proposes a holistic method that applies Mask R-CNN to text line extraction. A Mask R-CNN model is trained to extract text line fractions from document patches, which are then merged to form the text lines of the entire page. The presented method was evaluated on two well-known datasets of historical documents, DIVA-HisDB and ICDAR 2015-HTR, and achieved state-of-the-art results. In addition, we introduce a new challenging dataset of Arabic manuscripts, VML-AHTE, in which numerous diacritics are present. We show that the presented Mask R-CNN-based method can successfully segment text lines, even in such a challenging scenario.
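The abstract states that line fractions predicted on document patches are merged into page-level text lines, but it does not say how. The Python sketch below shows one plausible scheme under stated assumptions: the page is tiled into overlapping windows, each predicted instance mask (from any Mask R-CNN implementation, not shown) is represented as a set of pixel coordinates, and fractions from different patches that share pixels in an overlap region are treated as pieces of the same line. The helper names `patch_origins`, `to_page_coords`, and `merge_fractions` and the overlap-based merging rule are hypothetical illustrations, not the authors' published procedure.

```python
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]  # (row, col) position in page coordinates


def patch_origins(length: int, patch: int, stride: int) -> List[int]:
    """Start offsets of overlapping windows of size `patch` covering [0, length)."""
    if length <= patch:
        return [0]
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] + patch < length:
        starts.append(length - patch)  # last window ends exactly at the border
    return starts


def to_page_coords(origin: Pixel, fraction: Set[Pixel]) -> Set[Pixel]:
    """Shift a patch-local instance mask (a set of pixels) into page coordinates."""
    oy, ox = origin
    return {(oy + y, ox + x) for y, x in fraction}


def merge_fractions(fractions: List[Set[Pixel]]) -> List[Set[Pixel]]:
    """Group fractions that share at least one page pixel, using union-find."""
    parent = list(range(len(fractions)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    owner: Dict[Pixel, int] = {}  # first fraction that claimed each pixel
    for i, frac in enumerate(fractions):
        for p in frac:
            if p in owner:
                root_a, root_b = find(owner[p]), find(i)
                if root_a != root_b:
                    parent[root_b] = root_a  # shared pixel: same text line
            else:
                owner[p] = i

    lines: Dict[int, Set[Pixel]] = {}
    for i, frac in enumerate(fractions):
        lines.setdefault(find(i), set()).update(frac)
    return list(lines.values())


if __name__ == "__main__":
    # One line crosses two patches whose windows start at columns 0 and 4;
    # a second, shorter line lies only in the right-hand patch.
    left = to_page_coords((0, 0), {(2, c) for c in range(6)})
    right = to_page_coords((0, 4), {(2, c) for c in range(6)})
    other = to_page_coords((0, 4), {(5, c) for c in range(3)})
    print(len(merge_fractions([left, right, other])), "text lines")  # prints: 2 text lines
```

Merging on shared pixels is the simplest criterion and relies on the patches overlapping enough for a line to appear in both; it could wrongly join two neighbouring lines whose masks touch, which is one reason a real pipeline may need a stricter rule.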
Similar articles
Using Scale-Space Anisotropic Smoothing for Text Line Extraction in Historical Documents
This paper presents a novel approach for text line extraction which is based on Gaussian scale space, a dedicated binarization, and an energy minimization framework. It enhances the text lines in the image using multi-scale anisotropic second derivative of Gaussian filter bank at the average height of the text line. It then applies a binarization, which is based on component-tree and is tailore...
Prostate segmentation and lesions classification in CT images using Mask R-CNN
Purpose: Non-cancerous prostate lesions such as prostate calcification, prostate enlargement, and prostate inflammation cause too many problems for men’s health. This research proposes a novel approach, a combination of image processing techniques and deep learning methods for classification and segmentation of the prostate in CT-scan images by considering the experienced physicians’ reports. ...
Text Line Extraction from Complex Layout Documents
There are numerous stylish documents which do not have the traditional text layouts where printed text regions are not parallel to each other. Such complex layouts make text line extraction challenging due to multi-orientation of paragraphs. This paper introduces a system for the text line extraction from the complex layout documents. Proposed method is based on the concept of dilation and hist...
Text line extraction for historical document images
R. Saabni, A. Asi, J. El-Sana. Elsevier B.V., 2013. http://dx.doi.org/10.1016/j.patrec.2013.07.007
Text Extraction from Historical Handwritten Documents by Edge Detection
Many national archives or libraries keep large amount of historical handwritten documents. One problem that many archivists are facing is the sipping of ink through the pages of certain double-sided handwritten documents after long periods of storage. The result is that the handwritten characters from the reverse side appear as noise on the front side and even interfere with the front side char...
Journal
Journal title: Signals
Year: 2022
ISSN: 2624-6120
DOI: https://doi.org/10.3390/signals3030032